A senone based confidence measure for speech recognition
Abstract
This paper describes three experiments that use frame-level observation probabilities as the basis for word confidence annotation in an HMM speech recognition system. One experiment operates at the word level, one uses word classes, and one uses phone classes. In each experiment, hypotheses are labeled as correct or incorrect by aligning the best recognition hypothesis with the known transcript. The confidence of error prediction for each class is a measure of the resolvability between the correct and incorrect histograms.
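The abstract does not say how resolvability between the two histograms is computed. As a rough illustration only, the sketch below assumes each hypothesized word is scored by the mean of its frame-level log observation probabilities, and it takes resolvability to be one minus the intersection of the two normalized histograms. Both choices, and the names `word_score`, `normalized_histogram`, and `resolvability`, are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence


def word_score(frame_log_probs: Sequence[float]) -> float:
    """Assumed word-level score: mean frame-level log observation probability."""
    return sum(frame_log_probs) / len(frame_log_probs)


def normalized_histogram(scores: Sequence[float], lo: float, hi: float,
                         bins: int) -> List[float]:
    """Return per-bin relative frequencies for scores lying in [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for s in scores:
        # Clamp so that a score equal to `hi` falls into the last bin.
        idx = min(int((s - lo) / width), bins - 1)
        counts[idx] += 1
    return [c / len(scores) for c in counts]


def resolvability(correct: Sequence[float], incorrect: Sequence[float],
                  bins: int = 10) -> float:
    """Assumed measure: 1 - histogram intersection.

    0.0 means the two histograms coincide; 1.0 means they share no bin.
    """
    lo = min(min(correct), min(incorrect))
    hi = max(max(correct), max(incorrect))
    if hi == lo:
        # All scores are identical, so the classes cannot be separated.
        return 0.0
    h_correct = normalized_histogram(correct, lo, hi, bins)
    h_incorrect = normalized_histogram(incorrect, lo, hi, bins)
    overlap = sum(min(a, b) for a, b in zip(h_correct, h_incorrect))
    return 1.0 - overlap


if __name__ == "__main__":
    # Hypothetical scores for words labeled by alignment with the transcript.
    correct_scores = [-1.0, -1.2, -0.9]
    incorrect_scores = [-5.0, -5.5, -4.8]
    print(resolvability(correct_scores, incorrect_scores))
```

Under this assumed measure, a class whose correct and incorrect scores fall into disjoint bins gets a value of 1, and a class whose two histograms coincide gets a value close to 0, so higher values would indicate classes for which errors are easier to predict.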
Similar resources
Vocabulary-independent word confidence measure using subword features
This paper discusses how to compute word-level confidence measures based on sub-word features for large-vocabulary speaker-independent speech recognition. The performance of confidence measure using features at word, phone and senone level is experimentally studied. A framework of transformation function based system using sub-word features is proposed for high performance confidence estimation...
Learning small-size DNN with output-distribution-based criteria
Deep neural network (DNN) obtains significant accuracy improvements on many speech recognition tasks and its power comes from the deep and wide network structure with a very large number of parameters. It becomes challenging when we deploy DNN on devices which have limited computational and storage resources. The common practice is to train a DNN with a small number of hidden nodes and a small ...
Rapid adaptation for deep neural networks through multi-task learning
We propose a novel approach to addressing the adaptation effectiveness issue in parameter adaptation for deep neural network (DNN) based acoustic models for automatic speech recognition by adding one or more small auxiliary output layers modeling broad acoustic units, such as mono-phones or tied-state (often called senone) clusters. In scenarios with a limited amount of available adaptation dat...
Predicting unseen triphones with senones
In large-vocabulary speech recognition, the decoder often encounters triphones that are not covered in the training data. These unseen triphones are usually represented by corresponding diphones or context independent monophones. We propose to use decision-tree based senones to generate needed senonic baseforms for unseen triphones. A decision tree is built for each individual Markov state of e...
Efficient data selection for speech recognition based on prior confidence estimation using speech and context independent models
This paper proposes an efficient data selection technique to identify well recognized texts in massive volumes of speech data. Conventional confidence measure techniques can be used to obtain this accurate data, but they require speech recognition results to estimate confidence. Without a significant level of confidence, considerable computer resources are wasted since inaccurate recognition re...